ABSTRACT



A method and device to generate behavior for emulated
visitors traversing an internet web site. The visitors may
display behavior that is indistinguishable from that of
actual users, a subset of the actual users, or the behavior may
be purely hypothetical, such as when a visitor acts without
evidence of having made an intentional choice. The invention
tracks the actions of the visitors and develops reference
distributions that may be compared to a site's usage
distributions as obtained from actual visitors to the site. The
reference distributions are then used to implement statistical
estimation methods that measure relative information
content. The invention comprises a general implementation and
a deterministic implementation. The general version may be
applied to live production web sites, and the deterministic
version is best suited to offline processing.

71 Claims, 9 Drawing Sheets 



[Front-page drawing. Recoverable labels: Web Site Entry Pages
(URLs) 120; Historical Data 122.]

02/25/2004, EAST Version: 1.4.1



US 6,278,966 B1



OTHER PUBLICATIONS

J. S. Park et al., "An Effective Hash Based Algorithm For
Mining Association Rules", Proc. ACM-SIGMOD Conf. On
Management of Data, San Jose, May 1994.

Agrawal et al., "Parallel Mining Of Association Rules:
Design, Implementation, And Experience", IEEE Transaction
On Knowledge Data Engineering, vol. 8, No. 6, pp.
962-969, Dec. 1996.

Agrawal et al., "Fast Algorithms For Mining Association
Rules", Proceedings of the 1994 VLDB Conference, pp.
487-499, 1994.

Agrawal et al., "Mining Association Rules Between Sets of
Items In Large Databases", Proc. 1993 ACM SIGMOD
Conf., pp. 207-216, 1993.

Piatetsky-Shapiro, Chapter 13 "Discovery, Analysis, And
Presentation Of Strong Rules", from Knowledge Discovery
in Databases, pp. 229-248, AAAI/MIT Press, Menlo Park,
CA 1991.

Swami, "Research Report: Set-Oriented Mining For
Association Rules", IBM Research Division, RJ 9567 (83573),
Oct. 1993.

Ludwig et al., "Laboratory for Emulation and Study of
Integrated and Coordinated Media Communication", Proc.
ACM Workshop on Frontiers in Computer Communications
Technology, pp. 283-291, Aug. 1987.*



Chen et al., "Data Mining for Path Traversal Patterns in a
Web Environment", Proc. 16th International Conf. on
Distributed Computing Systems, pp. 385-392, May 1996.*

Chen et al., "Efficient Data Mining for Path Traversal
Patterns", IEEE Transactions on Knowledge and Data
Engineering, vol. 10, Issue 2, pp. 209-211, Mar.-Apr. 1998.*

Cooley et al., "Grouping Web Page Preferences into
Transactions for Mining World Wide Web Browsing Patterns",
Proc. Knowledge and Data Engineering Exchange
Workshop, pp. 2-9, Nov. 1997.*

Hellerstein et al., "ETE: A Customizable Approach to
Measuring End-to-end Response Times and Their Components
in Distributed Systems", Proc. IEEE 19th Inter. Conf. on
Distributed Computing Systems, pp. 152-162, May 1999.*

Schubert et al., "Web Assessment-Measuring the
Effectiveness of Electronic Commerce Sites Going Beyond
Traditional Marketing Paradigms", Proc. of the 32nd Annual
Hawaii Inter. Conf. on Systems Sciences, pp. 1-10, Jan.
1999.*

Barra et al., "Symmetric Adaptive Customer Modeling in an
Electronic Store", Proc. Third IEEE Symposium on
Computers and Communications, pp. 348-352, Jul. 1998.*

* cited by examiner 






U.S. Patent    Aug. 21, 2001    Sheet 1 of 9    US 6,278,966 B1







U.S. Patent    Aug. 21, 2001    Sheet 2 of 9    US 6,278,966 B1

[FIG. 1B: block diagram. Recoverable label: Visitation Logs.]

[FIG. 2: block diagram. Recoverable labels: Historical Data
(session logs, referral logs) 202; Flow Constraints; Offline
Emulator 206; Emulated Visitation Logs 208.]






U.S. Patent    Aug. 21, 2001    Sheet 3 of 9    US 6,278,966 B1

[FIG. 3: flowchart.
Start (302)
INITIALIZE (304): Initialize Emulated Distributions.
EMULATE (306): Randomly draw a number of Emulated Visitors
from the Emulated Distributions. Submit the Emulated
Visitors to the web site.
End (308)]






U.S. Patent    Aug. 21, 2001    Sheet 4 of 9    US 6,278,966 B1

INITIALIZE

Start

402

Select a subset of the following distributions:

Global Distributions:

  Clickstream Lifespan Distribution
  Clickstream Lifespan CDF
  Allowable Links List
  Global Link Preference Distribution

Local Distributions:

  Link Preference Distribution by page
  Link Preference Distribution by link type
  Clickstream lifespan distribution by page
  Clickstream lifespan distribution by resource
  Session End probability by page
  Site Exit probability by page

Create the distributions selected above.

Create the Entry Page Distribution.
Specify the maximum clickstream length.

Record all distributions created above in a
table as "enabled".

406

End

FIG. 4
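As an illustration of the INITIALIZE step of FIG. 4, the sketch below builds two of the named distributions (the Entry Page Distribution and a Clickstream Lifespan Distribution) from a sessionized log and records them in a table of enabled distributions. It is only a sketch under stated assumptions: the function names, the table layout, and the sample sessions are hypothetical, and each count is divided by the total number of sessions so that the probabilities sum to one.

```python
from collections import Counter

def entry_page_distribution(sessions):
    """Probability that a visit starts on each page (assumes at least one session)."""
    counts = Counter(s[0] for s in sessions if s)
    total = sum(counts.values())
    return {page: n / total for page, n in counts.items()}

def lifespan_distribution(sessions, max_length):
    """Probability of each clickstream length, capped at max_length."""
    counts = Counter(min(len(s), max_length) for s in sessions if s)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}

# Hypothetical sessionized log: each inner list is one visit, in click order.
sessions = [
    ["home", "news"],
    ["home"],
    ["shop", "home", "news"],
    ["home", "shop"],
]

# Table of "enabled" distributions (layout is an assumption for this sketch).
enabled = {
    "entry_page": entry_page_distribution(sessions),
    "clickstream_lifespan": lifespan_distribution(sessions, max_length=10),
}
print(enabled)
```

Keeping the distributions in one table keyed by name is one simple way to let a later step look up whichever distributions were selected.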






U.S. Patent    Aug. 21, 2001    Sheet 6 of 9    US 6,278,966 B1

[FIG. 6: flowchart, TRAVERSE (reference numerals 602 to 612).
Start
Compile a list of Candidate Link choices available on the
current page.
Compare Candidate Links with Allowable Link List. Remove any
Candidate Links that are not on the Allowable Link List.
CLICK: Select a link from the list of Candidate Links.
Traverse the link.
END OF SESSION? (No / Yes)
End]
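The TRAVERSE flow of FIG. 6 can be sketched roughly as follows. This is an illustrative sketch, not the patented implementation: the site map, the representation of the Allowable Link List as a set of (source, target) pairs, the uniform link choice, and the fixed end-of-session probability are all assumptions made for the example.

```python
import random

def traverse(site, entry, allowable, end_prob, max_clicks, rng):
    """Walk one emulated visitor through the site and return the pages viewed."""
    page, path = entry, [entry]
    while len(path) <= max_clicks:
        # Compile the Candidate Links available on the current page.
        candidates = list(site.get(page, []))
        # Remove any Candidate Links that are not on the Allowable Link List.
        candidates = [c for c in candidates if (page, c) in allowable]
        if not candidates:
            break  # nothing left to click, so the session ends
        # CLICK: select a link (uniformly here) and traverse it.
        page = rng.choice(candidates)
        path.append(page)
        # END OF SESSION? (assumed: a fixed probability after each click)
        if rng.random() < end_prob:
            break
    return path

# Hypothetical site topology and Allowable Link List.
SITE = {"home": ["news", "shop", "admin"], "news": ["home"],
        "shop": ["home", "news"], "admin": []}
ALLOWABLE = {("home", "news"), ("home", "shop"), ("news", "home"),
             ("shop", "home"), ("shop", "news")}

rng = random.Random(42)
path = traverse(SITE, "home", ALLOWABLE, end_prob=0.2, max_clicks=10, rng=rng)
print(path)
```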






U.S. Patent    Aug. 21, 2001    Sheet 7 of 9    US 6,278,966 B1

[Drawing sheet; no text recoverable.]






U.S. Patent    Aug. 21, 2001    Sheet 9 of 9    US 6,278,966 B1





FIG. 9 







METHOD AND SYSTEM FOR EMULATING 
WEB SITE TRAFFIC TO IDENTIFY WEB 
SITE USAGE PATTERNS 

BACKGROUND OF THE INVENTION 

1. Field of the Invention

The present invention relates to a system to simulate the 
behavior of visitors navigating an internet web site. More 
particularly, the invention concerns a generative model to 
simulate hypothetical traffic over a web site, and to use this
traffic in emulation of actual traffic observed at the web site. 

2. Description of the Related Art 

In internet web site (site) applications, database logs 
record the movement of traffic caused by visitors traversing 
a site. In medium to large sites, the amount of data that 
accumulates on a daily to weekly basis is immense. 
Commonly, this data contains a great deal of information 
about the behaviors of visitors to the web site; however, 
analyzing it using conventional statistical tools is prohibitive 
due to the sheer volume of data. 

Instead, data mining tools may be used to analyze the data
and to automatically "discover" interesting patterns and
relationships within the data. Such data mining tools include
association rule discovery methods such as those disclosed
in R. Srikant et al., "Mining Generalized Association Rules,"
1995, Proceedings of the 21st VLDB Conference, Zurich,
Switzerland, and R. Agrawal et al., "Fast Discovery of
Association Rules," 1996, Advances in Knowledge Discovery
and Data Mining, U. M. Fayyad et al., eds. AAAI
Press/The MIT Press, Menlo Park, Calif., USA. These types
of association rules can be used to identify patterns in a 
transaction database, where a transaction is a visitation 
session that occurs when a user peruses a web site. A web 
site server records the actions of users to the site in a "web 
log" database. This database is "sessionized" by identifying 
sequences of actions that correspond to distinct visits. 
Applied to such a sessionized web log, association rules can 
be used to discover the presence of content usage patterns 
(traffic flow) over a web site. Such rules may deliver 
statements of the form "75% of visits of referrer A belong to 
segment B," or "45% of visitors to page A also visit page B." 
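A statement such as "45% of visitors to page A also visit page B" corresponds to the confidence of the rule A -> B over a sessionized log. The sketch below shows one way such a figure might be computed. The function name and the sample sessions are hypothetical, and a production association rule miner would search over many candidate rules rather than evaluate a single one.

```python
from typing import Iterable, Sequence, Tuple

def rule_stats(sessions: Iterable[Sequence[str]],
               antecedent: str, consequent: str) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent."""
    total = with_a = with_both = 0
    for pages in sessions:
        visited = set(pages)          # a page counts once per visit
        total += 1
        if antecedent in visited:
            with_a += 1
            if consequent in visited:
                with_both += 1
    support = with_both / total if total else 0.0
    confidence = with_both / with_a if with_a else 0.0
    return support, confidence

# Hypothetical sessionized web log: each inner list is one visit.
sessions = [
    ["A", "B", "C"],
    ["A", "C"],
    ["B", "C"],
    ["A", "B"],
]
support, confidence = rule_stats(sessions, "A", "B")
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Here confidence is the fraction of visits containing page A that also contain page B, and support is the fraction of all visits containing both.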

One problem that arises in the internet web site domain 
due to the sheer volume of data that can be generated by a 
site with heavy user traffic is that saving all this data for 
future reference can be prohibitively expensive. One way to 
reduce the size of the data is to compress it into a set of 
summary statistics. However, this requires considerable 
foresight in choosing the set of statistics and does not allow 
one to posit questions that are only apparent at a later date. 

Although the internet is relatively new and few inventions
exist for application to the internet in general, much less to
web sites in particular, computer science, discrete
mathematics, and graph theory provide significant guidance 
in modeling static graphs. Given a static and completely 
described web page, such models can be applied to estimate 
the traffic flow over such a site without need to resort to a 
generative model or probabilistic simulation. However, 
characteristics of present day web sites preclude the appli- 
cation of such classical graph theoretic tools. 
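For a small, fully described static site, the kind of estimate that needs no probabilistic simulation can be illustrated by propagating probability mass over the link graph. The sketch below is an assumption-laden illustration rather than a method taken from this disclosure: it assumes every link on a page is equally likely, that a visitor leaves with a fixed probability after each page view, and that the hypothetical three-page graph is known in advance.

```python
def expected_visits(links, entry, exit_prob=0.5, steps=200):
    """Expected page views per visitor, computed without simulating visitors.

    links: page -> list of linked pages (static, fully known topology)
    entry: page -> probability that a visit starts there
    """
    mass = dict(entry)      # probability of being on each page at this step
    visits = {}
    for _ in range(steps):
        nxt = {}
        for page, p in mass.items():
            visits[page] = visits.get(page, 0.0) + p
            out = links.get(page, [])
            if not out:
                continue    # dead end: the visit ends here
            share = p * (1.0 - exit_prob) / len(out)
            for target in out:
                nxt[target] = nxt.get(target, 0.0) + share
        mass = nxt
    return visits

# Hypothetical static site.
links = {"home": ["news", "shop"], "news": ["home"], "shop": ["home", "news"]}
visits = expected_visits(links, {"home": 1.0})
print({page: round(v, 3) for page, v in visits.items()})
```

Once pages or links are created dynamically, the `links` table can no longer be written down in advance, which is the limitation the surrounding text describes.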

Present day web sites tend to be dynamic, not static, and
cannot be completely described in advance. Web pages can
be constructed dynamically, or links between pages can be
created dynamically, thereby yielding a dynamic cyclic
graph structure. Even web sites that are relatively static in
their design, such as web sites that are stable over a
span of a few weeks and do not rely upon dynamic page
creation or dynamic link creation, are extremely difficult
or tedious to model using conventional graph modeling tools
due to the sheer size of the connected graph and the special
nature of visitor behavior.

To overcome these difficulties, there is a pressing need for
an invention that automates the step of "describing" a graph
to a web site modeling tool, and that automatically takes into
account the special nature of web site users themselves such
that the model not only accounts for the topology of the web
site but also accounts for regularities evident in user traffic.
The invention should be capable of generating a distribution
of visitor behavior that would result if visitors demonstrated
no preferences and were influenced mostly by the site topology.
This emulated distribution could then be used as a reference
distribution against which the distribution generated by
actual users could be compared.

Preferably, the user characteristics processed by such an
invention should also be reducible into a small number of
descriptive statistics that, along with web site topography,
could be used to emulate user behavior and approximate
summary statistics not anticipated at the time the original
data was collected. This would allow the statistics to be
applied to determine "future" visitor behavior, such as how
past users would behave today when navigating a site
topology previously unavailable.

SUMMARY OF THE INVENTION 

Broadly, the present invention concerns a method and 
apparatus for generating hypothetical web site traffic that 
simulates the behavior of actual web site users. Data Mining 
Association Rules may be applied to this simulated traffic 
and used to identify usage patterns for users of a web site, 
such as discussed in the U.S. patent application entitled
"ASSOCIATION RULE RANKER FOR WEB SITE EMULATION"
by Steven Howard et al., assigned to the assignee
of the current invention, incorporated by reference herein 
and being filed concurrently herewith. 

Further, the present invention includes a method to discount
topology-affected rules. For example, one may use the
Web Walk Emulator of the present invention to generate the
distribution of visitor behavior that would result if visitors
demonstrated no personal preferences and were influenced
mostly by the site topology alone. This "emulated"
distribution can then be used as a reference distribution against
which to compare the distribution generated by actual users
who display personal preferences.

The present invention allows user characteristics to be
compressed into a small number of descriptive statistics,
which, along with the site topology, can be used to emulate
visitor behavior at a later time. An example of this use is
approximating novel summary statistics that were not
anticipated at the time the original data was being collected.

In one embodiment, the invention may be implemented to
provide a method to generate behavior for hypothetical
visitors (visitors) traversing a site. This generated data
emulates the behavior of actual users. The hypothetical
visitors may display behavior that is indistinguishable from
that of actual users, a subset of the actual users, or the
behavior may be purely hypothetical, such as when a user
acts without evidence of having made an intentional choice.
The present invention tracks the actions of the visitors and
develops reference distributions that may be compared to a
site's usage distributions as obtained from actual visitors to
the site. The reference distributions are then used in one
embodiment of the invention to implement statistical
estimation methods that measure relative information content,
for example, Kullback-Leibler Information Criterion or the
Bayesian criteria.
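As a rough illustration of measuring relative information content, the sketch below computes the Kullback-Leibler divergence between an observed usage distribution and an emulated reference distribution over the same pages. The page names and probabilities are hypothetical, and the small `eps` floor for pages missing from the reference is an assumption made to keep the example finite.

```python
import math

def kl_divergence(actual, reference, eps=1e-12):
    """KL(actual || reference) in bits; both map page -> probability."""
    total = 0.0
    for page, p in actual.items():
        if p > 0.0:
            q = max(reference.get(page, 0.0), eps)  # avoid division by zero
            total += p * math.log2(p / q)
    return total

# Hypothetical distributions of page views.
actual = {"home": 0.5, "news": 0.25, "shop": 0.25}      # observed users
reference = {"home": 0.25, "news": 0.25, "shop": 0.5}   # emulated visitors

divergence = kl_divergence(actual, reference)
print(f"{divergence:.3f} bits")
```

A divergence of zero would mean actual usage is indistinguishable from the reference; larger values suggest behavior that the reference assumptions alone do not explain.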

In another version of the method, the invention comprises
a general implementation; another embodiment comprises a
deterministic implementation. The general version may be
applied to live production web sites. The deterministic
version is suited to offline processing and does not burden the
active web site with additional traffic. In another
embodiment, this version also exploits certain types of data
in order to reduce the cost of its implementation.

In another embodiment, the invention may be imple- 
mented to provide an apparatus for generating web site 
traffic that substantially emulates actual web site traffic. The 
apparatus may include storage, a processor, and an emulation
system comprising various hardware components and
circuitry. 

In still another embodiment, the invention may be imple- 
mented to provide a signal-bearing medium tangibly 
embodying a program of machine-readable instructions 
executable by a digital data processing apparatus to perform 
method steps for generating web site traffic that emulates 
actual web site traffic. 

The invention affords its users a number of distinct
advantages. In either the general or the deterministic
embodiment, the invention generates the visitor behavior that
results if visitors to a web site do not demonstrate
preferences but are influenced primarily by the topology of the
web site alone. Another advantage is that user characteristics
may be compressed into a small number of descriptive
statistics that, along with site topology, may be used to
emulate web site user behavior that was not anticipated at
the time the original data was gathered. A further advantage
is that emulated behaviors may be used to perform trend
analysis on visitors' future behaviors, such as how visitors
today would behave on a site topology being proposed for
future use. The present invention is flexible enough to allow
user emulations for web site behavior ranging from true life
to purely hypothetical situations.

The invention also provides a number of other advantages 
and benefits, which should be apparent from the following 
description of the invention. 

BRIEF DESCRIPTION OF THE DRAWINGS 


The nature, objects, and advantages of the invention will 
become more apparent to those skilled in the art after 
considering the following detailed description in connection
with the accompanying drawings, in which like reference 
numerals designate like parts throughout, wherein: 

FIG. 1A is a block diagram of the hardware components
and interconnections of a digital signal processing system
used in accordance with one embodiment of the present
invention;

FIG. 1B is a block diagram for an online emulator of the
hardware components and interconnections of a digital
signal processing system used to emulate visitor traffic over
a live web site in accordance with one embodiment of the
present invention;

FIG. 2 is a block diagram for an offline emulator used to
emulate visitor traffic over an offline web site in accordance
with one embodiment of the present invention;

FIG. 3 is a flowchart of an operational sequence for
simulating web site traffic and emulating visitor behavior in
accordance with one embodiment of the present invention;

FIG. 4 is a flowchart of an operational sequence for an
embodiment of task 304 of FIG. 3 for initializing the
emulation in accordance with one embodiment of the
present invention;

FIG. 5A is a flowchart of the operational sequence for an
embodiment of task 306 of FIG. 3 for carrying out the
emulation process in accordance with one embodiment of
the present invention;

FIG. 5B is a flowchart of an operational sequence for an 
embodiment of task 504 shown in FIG. 5A for generating an 
emulated visit in accordance with one embodiment of the 
present invention; 

FIG. 6 is a flowchart of an operational sequence for an
embodiment of task 520 of FIG. 5B for an emulated visitor's
traversal of a web site in accordance with one embodiment 
of the present invention; 

FIG. 7 is a flowchart of the operational sequence for an 
embodiment of task 610 of FIG. 6 for clicking on a link and 
traversing the link in accordance with one embodiment of 
the present invention; 

FIG. 8 is a flowchart of the operational sequence for an 
embodiment of task 612 of FIG. 6 for ending of an emulated 
session in accordance with one embodiment of the present 
invention; and 

FIG. 9 is an exemplary embodiment of a signal bearing 
medium in accordance with the invention. 

DETAILED DESCRIPTION 

The nature, objects, and advantages of the invention will
become more apparent to those skilled in the art after
considering the following detailed description in connection 
with the accompanying drawings. As mentioned above, the 
invention concerns a generative model for generating hypo- 
thetical web site traffic that emulates actual web site traffic 
behavior. 

These emulated behaviors can be used for a variety of 
applications such as performing trend analysis on visitor 
behaviors. The emulated behaviors are intended to be as 
realistic as possible, but may be applied to a situation that 
has not yet occurred, namely, how might past users behave
today on a site having a topology different from what was
available in the past. Simply put, a user emulation allows
one to simulate web site usage behavior ranging from 
lifelike to purely hypothetical. For example, it might be 
shown: 

how traffic would distribute over the site if users showed 
no evidence of preference in their link selections (i.e., 
given a set of choices, they are equally likely to select 
any particular one); or 

how traffic would distribute over the site if users had
slightly different preferences on a particular page
(because users of a particular page can go on to visit an
indefinite number of pages thereafter, a slight local
difference in preference can result in global changes in
traffic over the entire site); or

how the behavior of a set of known users can be reduced 
to a sufficient set of statistics (in particular, from which 
the aggregate behavior of the original users can be 
recovered); or 

how a known set of users would behave given a slight 
change in the web site topology. 
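The first two scenarios in the list above can be sketched with a small emulation that compares visitors showing no link preference against visitors with a shifted preference on a single page. The five-page site, the weights, the visitor count, and the function names are all hypothetical, and the sketch deliberately omits entry page and session-end distributions.

```python
import random
from collections import Counter

# Hypothetical topology: home links to a and b; a leads to c, b leads to d.
LINKS = {"home": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}

def emulate(weights, n_visitors=2000, seed=0):
    """Count page views for emulated visitors.

    weights maps (page, target) -> relative preference; links that are
    not listed get weight 1.0, i.e. no evidence of preference.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_visitors):
        page = "home"
        while True:
            counts[page] += 1
            out = LINKS[page]
            if not out:
                break                      # dead end: the visit ends
            w = [weights.get((page, t), 1.0) for t in out]
            page = rng.choices(out, weights=w, k=1)[0]
    return counts

uniform = emulate({})                      # equally likely link choices
shifted = emulate({("home", "a"): 9.0})    # preference changed on one page only
print("uniform:", dict(uniform))
print("shifted:", dict(shifted))
```

Although only the choice on `home` is altered, the view counts of the downstream pages `c` and `d` shift as well, which is the local-to-global effect the second item describes.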

I. Hardware Components & Interconnections 

One aspect of the invention concerns a digital signal 
processing system 100 used to generate visitor traffic over 
web site, which may be generally represented by the various 
hardware components and interconnections shown in FIG. 1. 
In FIG. 1, the internet system 100 as shown comprises two
parts, a first system 101 and a second system 103. The first
system 101 may include a web site server 102 communicatively
connected via a web 106 to an internet service
provider (ISP) 110 using communication channels 108 and
109. Commonly, these types of communication channels are
fast-link channels. The server 102 may act as a host location
for data objects such as media or multimedia objects. In one
embodiment the server 102 may be a mainframe computer
manufactured by the International Business Machines
Corporation of Armonk, N.Y., and may use an operating system
sold under trademarks such as MVS. Or, the server 102 may
be a Unix computer, or OS/2 server, or Windows NT server
or IBM RS/6000 530 workstation with 128 MB of main
memory running AIX 3.2.5. The server 102 may incorporate
a database system, such as DB2, IMS, or ORACLE, or it
may access data on files stored on a data storage medium
such as a WORM or disk, e.g., a 2 GB SCSI, 3.5" drive, or
tape.

In another embodiment, the web site server 102 may
comprise one or more magnetic data storage disks commonly
referred to as direct access storage devices (DASD).
As is well known in the art, the data objects may be stored
by the server 102 in various formats depending upon the
type of media.

The ISP 110 may be connected to the second system 103
comprising an end-user unit 116 via a communication media
112, commonly a slow-link channel, where the ISP 110
controls the passage of information between the web site
server 102 and the user unit 114. "Fast-link" and "slow-link",
as mentioned above, refer to the relative speed with
which the communication channels 108, 109, and 112 can
transfer a data object. In any case, the object transfer
capabilities of the fast-link channel generally exceed those
of the slow-link channel, and one or both links may comprise
a line, bus, cable, electromagnetic link, microwave,
radio signal, or other wireless means for exchanging
commands, media objects, and other information and data
between the web site server 102, the ISP 110, and the user
unit 116.

Among other features, the ISP 110 may include a firewall
used as a means of reducing the risk of unwanted access to
the user unit 114. Although the ISP 110 is pictured as a
separate device, the ISP may be integral to the user unit 114.
The ISP 110 may also include a transformer 111 that may be
used to transform a media object and set and/or to implement
transfer parameters to facilitate efficient transfer of the
media object between the transformer 111 and the user unit
114. In another embodiment, the ISP 110 and the transformer
111 may be eliminated from the system 100, the ISP 110 may
be eliminated and the transformer 111 integrated into the web
site server 102 or be included within the second system 103
rather than the first system 101 as shown.

The end user unit 114 may include a processing unit (not
shown), such as a microprocessor or other processing
machine, communicatively coupled to a storage unit. The
storage unit may also include a fast-access memory and may
include nonvolatile storage (not shown). The fast-access
memory preferably comprises random access memory, and
may be used to store the programming instructions executed
by the processing unit during execution of a computer
program. The nonvolatile storage may comprise, for
example, one or more magnetic data storage disks such as a
"hard drive" or any other suitable storage device. Further,
the end user unit 114 may include in one embodiment an
output module for outputting or displaying program status
results on a graphic display, print device or storage medium.

Despite the specific foregoing description, ordinarily
skilled artisans (having the benefit of this disclosure) will
recognize that the apparatus discussed above may be
implemented in a machine of different construction, without
departing from the scope of the invention. As a specific
example, one of the components such as ISP 110 may be
eliminated; furthermore, the ISP 110 may be integral to the
web site server 102.

Regardless of the configuration of the web site server 102,
[...] topology.

II. Web Site Characteristics

A. Web Site Topology

A web site essentially comprises a set of pages. The pages
are linked together allowing a visitor to move from one page
to another page. This "linked" arrangement between pages
constitutes a part of a web site's topology. A set of pages
can contain or point to a variety of resources, including
[...] scripts (an interpretable program that
can be executed in response to visitor actions), and
"clickable" links to resources. Clickable refers to the ability of a
web site visitor to traverse at least part of the web site by
"clicking" on a designated location and being linked to a
different location or resource. For example, a clickable
resource can result in the following effects:

the visitor traverses the site topology to another page;

the current page is modified in some manner;

background processing invisible to the visitor is executed
(e.g., when the visitor clicks on an advertisement, a
count is incremented in a database); or

background processing visible to the visitor is executed
(e.g., when a visitor clicks on a button on an Entry
Form, that form may be submitted to a database,
followed by the presentation of new page view to the
visitor.)

The present invention concerns clickable resources. In
one embodiment, a page comprises itself (a page is itself a
resource) and may include pointers to additional resources
(images, text, etc.) including zero or more clickable links to
other resources such as other pages, as well as buttons and
other interactive controls which control access to data or
scripts. Each clickable link invokes a resource, and when
clicked, logs a "hit" on that resource in the web log. A hit
indicates that a resource fitting the desired description has
been found. A single click can result in hits to a number of
resources, e.g., when a page is viewed, the resources
associated with that page log hits in the database. The web site
topology may be mapped as a connected graph that
describes the pages, their clickable links and their clickable
resource, as well as page content, for example, images, text,
etc.

B. Visitors

A "visit" (also referred to in this application as a
"session") is a single user's sequence of requests, such as
pages viewed, while at a web site. Visitors may peruse a site
by entering it via several possible entry points and traversing
the web site by clicking on clickable resources as discussed
above.

C. Web Logs

Web visitation logs record the actions of every visitor to
the web site, gathering historical data on who visits the site
and what they do there. This includes reports such as the
number of users per day and per hour, what times are most
active, how much data is accessed from the site per time
period and per visit, which pages are accessed most
frequently, which files are downloaded most frequently,
details on where users come from geographically, what
browsers they use, and what computer platforms they own.
If "referral logs" are enabled, it can also be recorded where
the user originates within the internet domain space,
indicating the previous URL that they viewed immediately prior
to entering the current site. Once a user leaves the site, it can
no longer be tracked in the web visitation logs for that site.

In addition to using these conventional summary
statistics, the present invention applies data-driven statistical
pattern discovery methods ("data mining") to sift through
the data automatically in search of unusual or otherwise
interesting patterns, such as regularities, irregularities,
co-occurrences, correlations, or trends.

III. Operation

In addition to the various hardware embodiments
described above, a different aspect of the invention concerns
a method for simulating web site traffic. A general online
version of the present invention is shown in FIG. 1B and a
general offline version is shown in FIG. 2.

In FIG. 1B, an online emulator 124 considers real time
data such as the available web site entry pages 120 and
historical data 122 to determine the movement preferences
for a web site visitor across a web site 116. Visitation logs
118 are created from each visitor's traversal of the site and
may be used in offline emulation as discussed below. Using
this accumulated data, the present invention applies a
method to determine the movement preferences.

In FIG. 2, an offline emulator 206 uses historical data 202
such as session logs and referral logs to generate "emulated"
visitation logs 208. These emulated logs 208 compose
preference profiles for hypothetical web site visitors and
other relevant information. The hypothetical visitor's
preferences are based upon an analysis of the historical data 202
and certain subjective preferences. Flow constraints 204
representing topology limitations inherent in a web site are
also used by the offline emulator 206 to determine truly
preferential selections from mandated selections. The offline
emulator 206 generates these emulated visitation logs 208
using a method as described in detail below.

A. In General

A descriptive overview of a single iteration of an
emulated visit for a general embodiment of the present invention
is shown in FIGS. 3-8. Referring to FIG. 3, the method starts
in task 302 and a desired visitor behavior is specified during
initialization in task 304 using a set of probability
distributions. In one embodiment, these distributions are based upon
data mined using the Association Rule Ranker for Web Site
Emulation invention referenced above. In another
embodiment, the distribution is based upon a program
assembled to reflect the desires of the person studying the
traffic.

In either case, whenever a choice needs to be made (e.g.,
select an entry page, select a link, end the session) for an
emulated visitor, the method makes a selection according to
a set of distributions. Thereafter, the emulated visitor is
passed through the site in task 306 where the emulated
visitor enters the site at a particular page, and then traverses
the site by making choices according to the probability
distributions specified in the previous step. These two steps
are repeated until sufficient coverage of the site is achieved
and a stop is invoked in step 308, ending the method. An [...]

1. Specifying Visitor Behavior

Emulated behaviors may be described by rating
distributions over a finite set of options. Designated behaviors may
be specified according to some prior order to attain a certain
effect, such as making all emulated visits enter the site at a
particular page in order to evaluate how traffic flows from
that page throughout the rest of the site. To make emulated
visitors similar to a set of actual users observed in the past
[...] task 404 of FIG. 4, certain conduct [...]
actual site visits may be used. These descriptions can be
[...] Examples of these types of behaviors are described below.

General Behavior (Aggregate Descriptions)

These descriptions describe the general emulated behavior
of visitors overall. Typical examples shown in task 404
[...]:

entry page distribution: a visitor's entry page is the first
page [...] visitors from
outside the site. The entry page distribution describes
how entry pages selected by visitors distribute over
the set of all possible entry pages at the web site;

exit page distribution: a visitor's exit page is the last page
[...] distribution
describes how exit pages distribute over the web site's
viewable pages; and

clickstream lifespan distribution that gives a distribution
[...] average session [...]

Although numerous general behaviors can be emulated
[...] invention, not all behavior is useful. For
example, one particular general behavior distribution that is
[...] emulating actual visitors, but that is very
[...] generating hypothetical emulated visitors, is the
"[...] selection distribution." This gives the
distribution of link selections made by visitors over an
[...] candidate links. A general example of this is
[...] distribution, corresponding to visitors which,
given a set of candidate links from which they must choose,
[...] likely [...]. [...]
example of this is a distribution that weights link preference
according to their position in rank ordered list. A tangible
[...] distribution is advertisement position
[...] positioning, visitors are more likely
[...] advertisements placed near the top of the page
than advertisements placed lower on the page.

The computation of this type of distribution from empirical
[...] straightforward. For example, to compute the
entry page distribution over a given set of sessions:

Identify the set of entry pages having at least one entry
from the given set of sessions;

Given the set of entry pages identified above, count the
number of sessions for which each served as an entry
page; and

Normalize each count by the number of entry pages.

As another example, to compute the clickstream lifespan
distribution over a given set of sessions:

Identify the set of entry pages having at least one entry
from the given set of sessions;

Given the set of entry pages identified above, count the
number of sessions for which each served as an entry

example of a stop might be to continue generating visits until page; and 

all reachable pages on the site are visited "x*' number of Normalize each count by the number of entry pages, 

times. 65 l.b. Specific Behavior (Conditional Descriptions) 

'llie initialization step 304 of FIG. 3 is shown in greater Some of the aggregate descriptions listed above can be 

detail in FIG. 4 where initialization begins in task 402. refined to describe visitor behavior to a "click-by-click" 
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resolutioD as also shown in task 404 of FIG. 4. For example, 
visitor behavior may be specified to depend upon a recent 
event in the visitor's session, for example, having viewed a 
particular page. Examples of such conditional descriptions, 
include but are not limited to: 5 
Link selection distribution (by page), where the distribution
of actual clicks over the set of clickable links on a
particular page, averaged over all visits to that page, is
determined;

Clickstream lifespan distribution (by page), where the
remaining clickstream lifespan distribution for visitors
on a particular page is determined. For example,
visitors to a financial services web site might typically 
leave shortly after viewing their account balances, 
whereas visitors to the login page will tend to have a 
relatively much higher remaining clickstream lifespan; 

Clickstream lifespan distribution (by resource), where the
distribution of the remaining clickstream lifespan of a
visitor that has just accessed a particular resource is
measured. For example, most visitors to a financial
services web site might leave shortly after placing a
trade (thereby launching a script that executes a
transaction against their account), whereas most visitors
that have just logged on (thereby executing a login
script) typically have relatively higher clickstream
lifespans remaining;

Session end probability (by page), where the conditional
probability that a visitor to a particular page will end
the session immediately thereafter is determined;

Site exit probability (by page), where the conditional 
probability that a visitor to a particular page will exit 
the site immediately thereafter is determined; and 

Resource-dependent link selection distribution, where the
propensity of the average visitor to click on a particular
category of resource out of several candidate categories
is measured, for example, whether visitors tend to be
more likely to click on an internal link than an
advertisement.
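
As a rough illustration of how two of the conditional descriptions above might be estimated from sessionized logs, the following Python sketch computes a per-page session end probability and a per-page link selection distribution. The log layout (each visit as an ordered list of page names) and the function names are assumptions made for this example; they are not taken from the described method.

```python
from collections import Counter, defaultdict

def session_end_probability(sessions):
    """Estimate P(session ends | visitor is viewing page) for each page.

    `sessions` is assumed to be a list of visits, each visit a list of
    page names in the order viewed (a hypothetical log format).
    """
    views = Counter()   # how often each page was viewed
    ends = Counter()    # how often each page was the last page of a visit
    for visit in sessions:
        if not visit:
            continue
        views.update(visit)
        ends[visit[-1]] += 1
    return {page: ends[page] / views[page] for page in views}

def link_selection_by_page(sessions):
    """Estimate P(next page | current page) from observed transitions."""
    counts = defaultdict(Counter)
    for visit in sessions:
        for current, nxt in zip(visit, visit[1:]):
            counts[current][nxt] += 1
    return {
        page: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for page, c in counts.items()
    }

sessions = [
    ["login", "balance"],
    ["login", "trade", "balance"],
    ["login", "balance", "trade"],
]
end_prob = session_end_probability(sessions)
link_dist = link_selection_by_page(sessions)
```

In this toy data no session ends on the login page, so its estimated end probability is zero, which mirrors the financial services example given above.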

1.c. User-Segment Specific Behavior

Any of the descriptions mentioned above can also be
determined for a particular segment (subset) of a set of
actual visitors. For example, the link selection distribution
may be consistent with that of actual visitors overall on
every page except for one, where it is instead consistent with
the link selections observed for a particular segment of
actual visitors. This allows the present invention to measure
hypothetical situations such as "what if every visitor to this
particular page acted in the same way as this particular
segment of visitors?"
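
One way to picture such a "what if" scenario is as a lookup that returns the segment's distribution on the overridden page and the overall distribution everywhere else. The sketch below is only illustrative; the data layout (page mapped to a dictionary of link probabilities) and the name `link_distribution_for` are hypothetical.

```python
def link_distribution_for(page, overall, segment, override_pages):
    """Return the link selection distribution to use on `page`.

    `overall` and `segment` map page -> {link: probability}; both the
    data layout and the function name are hypothetical.
    """
    if page in override_pages and page in segment:
        return segment[page]
    return overall[page]

overall = {
    "home": {"news": 0.7, "shop": 0.3},
    "shop": {"cart": 0.2, "home": 0.8},
}
# Behavior observed for one visitor segment, e.g. repeat buyers.
segment = {"shop": {"cart": 0.9, "home": 0.1}}

# Every page behaves as overall, except "shop", which follows the segment.
what_if = {p: link_distribution_for(p, overall, segment, {"shop"}) for p in overall}
```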

After a distribution has been selected and initialized, the
respective distribution is created in task 406. Likewise, an
entry page distribution is created and, if desired, a maximum
clickstream length may be specified. Any distribution
created in task 406 is recorded for use during the emulation
process, such as that shown in FIG. 5A. The initialization
ends in task 408 after desired distributions have been
created.
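
A minimal sketch of how the entry page distribution and a clickstream lifespan distribution could be built from session logs is given below. Two assumptions are worth flagging: counts are normalized by the total number of sessions so that the probabilities sum to one, and a session's lifespan is measured as its number of page views. The function names and the log layout are hypothetical.

```python
from collections import Counter

def entry_page_distribution(sessions):
    """Empirical entry page distribution: fraction of sessions entering at each page."""
    counts = Counter(visit[0] for visit in sessions if visit)
    total = sum(counts.values())
    return {page: n / total for page, n in counts.items()}

def clickstream_lifespan_distribution(sessions):
    """Empirical distribution of session length, measured in page views (an assumption)."""
    counts = Counter(len(visit) for visit in sessions if visit)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}

sessions = [["home", "news"], ["home"], ["shop", "cart", "home"], ["home", "shop"]]
entry_dist = entry_page_distribution(sessions)
lifespan_dist = clickstream_lifespan_distribution(sessions)
```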

1.d. Emulate

The emulation task 306 of FIG. 3 is shown in greater
detail in FIG. 5A and starts in task 502. An emulated
visit, discussed in greater detail in FIG. 5B, is generated
using the randomly drawn emulated visitors from the
emulated distributions. Each selected emulated visitor is
submitted to the web site in task 504 and the method continues
in task 506 until all emulated visitors have been passed
through the site. The emulation method ends in task 408.

The generation of an emulated visit as shown generally in
task 504 is shown in greater detail in FIG. 5B. Generation
begins in task 510 and an entry page is chosen at random in
task 512 from the entry page distribution. The entry page is
used to determine where the emulated visitor has entered a
site. If the clickstream lifespan is enabled in task 514, a
maximum clickstream length for the emulated visit is
selected in task 516. In one embodiment, the length is
selected at random from a clickstream lifespan distribution.
In another embodiment, the length is chosen as desired by
the user studying the site. This clickstream lifespan may be
used to limit the total "clicks" to be exercised in traversing
a site when using one version of the invention.

Regardless of whether or not a clickstream lifespan is 
used, the emulated visitor enters the site at the selected entry 
page in task 518. The visitor traverses the site in task 
520 — traversing being shown in greater detail in FIG. 6 
starting with task 602 — and the generation of the emulated 
visit ends in task 522. 
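
The start of an emulated visit described above can be sketched as two random draws: one for the entry page and, when a clickstream lifespan distribution is enabled, one for the maximum clickstream length. This is a simplified illustration; `draw`, `start_emulated_visit`, and the dictionary layout of the visit are hypothetical names chosen for the example.

```python
import random

def draw(distribution, rng):
    """Draw one key from a {value: probability} mapping."""
    values = list(distribution)
    weights = [distribution[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]

def start_emulated_visit(entry_dist, lifespan_dist=None, rng=random):
    """Pick an entry page and, if a lifespan distribution is enabled, a click budget."""
    entry_page = draw(entry_dist, rng)
    # None means no clickstream lifespan limit is in force for this visit.
    max_clicks = draw(lifespan_dist, rng) if lifespan_dist else None
    return {"page": entry_page, "max_clicks": max_clicks, "path": [entry_page]}

rng = random.Random(0)
visit = start_emulated_visit({"home": 0.75, "shop": 0.25},
                             {1: 0.25, 2: 0.5, 3: 0.25}, rng)
unlimited = start_emulated_visit({"home": 1.0})
```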

In FIG. 6, traversing comprises compiling a list of links
available on the current web page, referred to as candidate
links, in task 604. These links may be restricted in availability
by having the allowable links distribution enabled as
shown in task 606. If this distribution is enabled, then the
available candidate links are compared with the allowable
links in task 608. Any candidate link that is not also an
allowable link is removed from further consideration. At
random, an available link is selected and the link is traversed
in task 610. If further available links remain to be traversed
in task 612, then the traversal of FIG. 6 is repeated for each
available link until the session ends in task 616 and as
discussed below with respect to FIG. 8.
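
The comparison of candidate links against allowable links amounts to a simple filter. The following sketch assumes links are represented as strings and that the allowable links are supplied as a set; both choices, and the name `candidate_links`, are illustrative only.

```python
def candidate_links(page_links, allowable=None):
    """Return the links on the current page that may be traversed.

    `page_links` is the list of links found on the page; `allowable`,
    when given, is the set of links the emulation is permitted to follow.
    Both names are hypothetical.
    """
    if allowable is None:          # allowable-links restriction not enabled
        return list(page_links)
    return [link for link in page_links if link in allowable]

links = ["/news", "/shop", "http://ads.example/banner", "/help"]
allowed = {"/news", "/shop", "/cart"}
remaining = candidate_links(links, allowed)
```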

Selection of a link or CLICK as shown in task 610 of FIG. 
6 is shown in greater detail in FIG. 7. CLICK begins at task 
702 and it is determined whether or not link preference 
distribution by link type has been enabled in task 704. If it 
has not been enabled, the method continues with task 708. 
Otherwise, available links — also referred to as candidate 
links — are separated, for example, by internal links and 
external links in task 706. A weight may be assigned to each 
candidate depending upon the distribution, where, for 
example, a weight might refer to preferring one link over 
another link. In another embodiment, the weights may be 
determined using the data mining association rules referenced
herein. Any candidate having a predetermined weight, or a
weight within a preassigned range, is removed from the
available candidate links.

In task 708, if link preference distribution by page is not 
enabled, then the method continues in task 710. If the 
"by-page" distribution is enabled, a link preference distri- 
bution for the current page is retrieved. If not found, such 
distribution may be generated. Similarly to the sorting 
discussed with respect to weighing in task 706, candidate 
links are sorted and weighted according to this link prefer- 
ence distribution in task 714. The method continues in task 
718. 

However, in task 710, if global link preference distribu- 
tion is enabled, candidate links are sorted based upon their 
respective positioning on the page. A global link preference 
distribution is retrieved, and the candidates are weighted 
according to this distribution. The method continues in task 
718. But if the global link distribution was not enabled, 
candidate links are weighted in task 712 according to a 
uniform distribution selected by the person studying the web 
site, for example, where each candidate is equally likely. In 
another embodiment, the distribution may be generated 
based upon predetermined criteria. 
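
The order of precedence in tasks 708 through 712 can be sketched as a small weighting function followed by a weighted random draw. The sketch assumes a per-page distribution keyed by link, a global distribution indexed by position on the page, and a fall-through to the next rule when no per-page distribution is found (the text instead allows one to be generated). All names are hypothetical.

```python
import random

def weight_candidates(candidates, page, by_page=None, global_by_position=None):
    """Assign a weight to each candidate link.

    Precedence follows the order described in the text: a per-page
    distribution if available, else a global position-based distribution,
    else uniform weights. Argument names and layouts are hypothetical.
    """
    if by_page is not None and page in by_page:
        dist = by_page[page]
        return [dist.get(link, 0.0) for link in candidates]
    if global_by_position is not None:
        # Candidates are assumed to be listed in page order (top first);
        # positions beyond the table reuse its last weight.
        last = global_by_position[-1]
        return [global_by_position[i] if i < len(global_by_position) else last
                for i in range(len(candidates))]
    return [1.0] * len(candidates)

def click(candidates, weights, rng=random):
    """Select one candidate at random according to its weight."""
    return rng.choices(candidates, weights=weights, k=1)[0]

candidates = ["/top-ad", "/article", "/bottom-ad"]
uniform = weight_candidates(candidates, "home")
positional = weight_candidates(candidates, "home", global_by_position=[0.6, 0.3, 0.1])
per_page = weight_candidates(candidates, "home", by_page={"home": {"/article": 1.0}})
chosen = click(candidates, per_page, random.Random(1))
```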
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Regardless of whether one, some, all or none of the vi. For every possible (exit page, clickstream length) pair, 
distributions of tasks 708, 710, and 712 are enabled, an specify the probability that an emulated visitor entering 

available link is selected at random in task 718 according to that page and having a particular clickstream length 

their respective weighing. The link is then traversed in task will exit at that page. When the emulated visitor 

720 and CLICK ends in task 722. 5 encounters a candidate exit page having a particular 

FIG. 8 shows one method for determining if the end of a clickstream length, end the session according to the 

session has been reached in task 612 of FIG. 6. If the session probability associated with that (exit page, clickstream 

length equals the global maximum, the session ends in task length) pair; or 

810. Global maximum refers to, in one embodiment, to vii. Specify the emulated visitors* clickstream lifespan 
running out of links to click. It may also refer to other global cumulative distribution function (CDF) over the set of 

limitations, such as relating to time or space. If the session allowable clickstream lengths, given that at each chck- 

length does not equal the global maximum, and if click- stream length there is the probability that the session 

stream lifespan distribution is not enabled in task 806, and ^| ^nj. At each click, end the session probabilistically 

if local clickstream lifespan distributions are enabled in task accordine to this CDF 

812 a choice is made whether to end the session via a 4 g.^^,, veRion ' 

random draw based upon the most relevant local distribution . 4 - j u ct^^o -t o i_ c 

■ , t o^A . *i. ' ^ * 1 * As mentioned above, FIGS. 3-8 show a sequence of 

m task 814 or to contmue with the session. The most relevant j . ^ . . • i_ .i. r .2 

dLstribulion may be any local distribution for the page being "^^^^ ^*^P^ lUustratmg the method aspects of the present 

studied, or for the site being studied. If the secession ends in "^vention^ Readers familiar with the general methodology 

task 816, the session is over in task 810. Otherwise, the associated with monle carlo simulations, random walk 
session continues in task 818. 20 simulations, stochastic dynamical simulations, or generative 

If no local clickstream lifespan distribution is enabled in models of probabilistic processes will readily understand the 

task 812, the session also continues in task 818. Similarly, if following detailed descriptions. And for further ease of 

the session length does not equal the session maximum explanation, but without any limitation intended thereby, the 

length, the session continues in task 818. examples of FIGS. 3-8 are described in the context of the 

To assist in further understanding the present invention, 25 internet system 100 described above, 
additional discussion follows interlaced with various In many applications (e.g., statistical physics, molecular 

examples comprising possible applications for the invention. modeling, physical control systems, operations research) 

2. Simulating Actual Visitor Behavior estimating the state probability distributions and state tran- 
The present invention uses a set of behavioral statistics to sition probabilities of a probabilitic process is desirable. The 

simulate visitor behavior, generating "visitors" that exhibit process may be well known at some level, yet despite this it 
traffic flow descriptions consistent with those caused by niay be difficult or impossible to compute such measure- 
actual visitors that traverse a site. The descriptions ments analytically due to the complexity of the graph 
(distributions) discussed above with respect to FIG. 4 are describing the system. Fortunately, numerical methods may 
only some of the most generally applicable. Additions to be used to model such systems. A web site is such a system, 
these examples may lend even more realism to the emulation Further, many web sites cannot be described by a static 
process that could be customized to the characteristics of a connectivity graph because of their dynamic construction, 
particular web site topology or customized to the character- Monte Carlo methods— methods used to obtain an 
istics of a particular set of known visitors. These additional approximate solution to a numerical problem by the use of 
examples are selected by the user of the present invention as random numbers— may be used for investigating the behav- 
rcquired for creating a desired simulated behavior. ^ ior of complex, nonlinear, and even dynamic stochastic 

3. Emulating Hypothetical Visitors systems like a dynamic web site. In the present invention. 
Further, using the present invention, traffic flow statistics emulated visitors as defined in a problem start by making 

can be obtained for hypothetical visitors that have never decisions much like their real-life counterparts, that is, the 
been encountered in actual site traffic by specifying the method of the invention selects each decision for an emu- 
emulated distributions to be applied to a web site. For lated visitor based upon the distributions discussed above, 
example, if a user of the invention wants to set the entry page Decisions on actions to take are based either on probabilities 
distribution range from "lifelike" to "hypothetical," the user computed from actual web site traffic data, on resulLs of 
could choose to: learning models, or on subjective expectations gleaned from 

i. Select randomly according to the empirical distribution observational experience. TTiese decisions include selecting 
obtained over a set of actual visitors; which page to use to enter a site, which hyperlinks to select 

ii. Select randomly according to a uniform distribution in traversing the site, whether or not to wholly ignore certain 
over a finite set of entry pages; or classes of hyperlinks — such as help and support links — and 

iii. Set to a particular single entry page which has never when to end the visitation session, either by stopping at a 
before served as an entry page for actual visitors. certain location or by exiting the site. 

If the user wanted to regulate the emulated visitors' 55 These probabilities can be drawn from aggregate statistics 

clickstream lifespan, the user could choose to: averaged over the entire site, local statistics conditioned on 

iv. At the "birth" of an emulated visitor, choose a number a particular page, resources or other specific location within 
at random according to the empirical clickstream the site topology, or on "markov" probabilities computed 
lifespan distribution obtained over a set of actual users, over sequences or chains within the site topology stmcture. 
and leave this number fixed through the session. When 60 *^oe such method comprising one embodiment of the present 
the emulated visitor's session length equals this invention represented in pseudo code follows: 

number, the session would end. 1- Parameterize Entry Page Distribution; 

V. For every possible exit page, specify the probability that 2. Parameterize visitation stopping rules 
the emulated visitor exits at that page. When the to avoid endless or lengthy loops, 
emulated visitor encounters a candidate exit page, end 65 to regulate visitation lifespan; 
the session according to the probability associated with 3. Parameterize topology traversal decision mles 
that page; e.g., page wise like preference Distribution; and 
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4. Parameterize simulation stopping rules 
   e.g., detect when sufficient coverage of the site has been
   attained.
   e.g., detect when sufficiently many visits have been
   generated.

Generate emulated visitors,
Submit each visitor to the site,
And repeat,
Until a stopping rule is satisfied.

5. While (simulation stopping rule indicates more processing
   is necessary)
{
   5a. Choose an entry page
   5b. Submit an emulated visit to the entry page
   5c. While (visitation stopping rules indicate that emulated
       visit can continue)
   {
      Assemble a list of available clickable items.
      Identify a subset of this list as candidate click options.
      Weight each candidate click option according to a
      probability distribution.
      Select a candidate click option at random according
      to this distribution.
      "Hit" the resource identified by the selected click option.
      //--
      // COMMENTARY:
      // At this point the web log will record a hit on this
      // resource, as well as on any other resources that are
      // hit as a side effect.
      // This hit may result in a new page view,
      // or, it may take the emulated visit offsite, thereby
      // ending the session.
      // If the click selection takes the emulated visit
      // offsite, exit this while loop.
   } // end of visitation while loop
} // end of simulation while loop.
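
For readers who prefer runnable code, the pseudo code above can be approximated by the following Python sketch. It is not the patented implementation: the toy site graph `SITE`, the uniform link weights, the coverage-based simulation stopping rule, and all function and parameter names are assumptions introduced for illustration. An off-site link is represented by `None`.

```python
import random

# A toy site: page -> list of links; None stands for an off-site link.
SITE = {
    "home": ["news", "shop", None],
    "news": ["home", "shop"],
    "shop": ["home", "cart"],
    "cart": ["home", None],
}

def emulate_visit(site, entry_dist, max_length, rng):
    """One emulated visit: enter, then click until a visitation stopping rule fires."""
    pages = list(entry_dist)
    page = rng.choices(pages, weights=[entry_dist[p] for p in pages], k=1)[0]
    path = [page]
    while len(path) < max_length:          # rule: regulate visitation lifespan
        candidates = site.get(page, [])
        if not candidates:                 # rule: no clickable items remain
            break
        weights = [1.0] * len(candidates)  # uniform link preference (assumed)
        target = rng.choices(candidates, weights=weights, k=1)[0]
        if target is None:                 # click led off-site: session ends
            break
        page = target
        path.append(page)
    return path

def simulate(site, entry_dist, min_views=5, max_length=20, max_visits=10_000, seed=0):
    """Generate visits until every page has been viewed `min_views` times,
    or until `max_visits` visits have been generated."""
    rng = random.Random(seed)
    views = {page: 0 for page in site}
    log = []
    while min(views.values()) < min_views and len(log) < max_visits:
        path = emulate_visit(site, entry_dist, max_length, rng)
        for page in path:
            views[page] += 1
        log.append(path)
    return log, views

log, views = simulate(SITE, {"home": 0.8, "news": 0.2})
```

The two nested loops correspond to the simulation while loop and the visitation while loop; the emulated visitation log is simply the list of paths in `log`.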






This embodiment of one method of the present invention
is a general purpose implementation that may be applied, for
example, to a live production site. Therefore, emulated visits
can experience exactly the same conditions presented to
actual visitors. The method is also probabilistically "accurate"
to an arbitrary degree of precision, meaning that the
behavior of actual visitors can be generated to any degree of
realism by increasing the complexity of the simulation.

The general method may be applied to a replicated
version of a web site, resulting in a simulation that does not
intrude on the live production site. Other benefits of the
general method are that emulated traffic experiences "live"
web conditions, and that all links available to actual visitors
are accessible to emulated visitors.

Further, the general method has general applicability:
historical logs for reconstructing the site topology (e.g.,
referral logs) are not required; traffic analysis can begin
immediately; and the accuracy of the method does not depend
upon the quantity and quality of historical data.

The next section presents another embodiment of the
present invention for a specific implementation specially
suited to offline simulation. This method exploits some of
the special characteristics of the offline situation and also
employs some approximations of the probability distributions
employed by the general method.



B. Deterministic Version 

A web site can be simulated offline given sufficient types
and amounts of historical information drawn from actual
visitations. The type of historical information required may 
include sessionized web logs (activity logs parsed into 
sessions) or referral logs (identifying for each visitor's 
activity the immediately prior activity for that visitor). 
Referral logs are used to allow the deterministic version of 
the method to reconstruct the topology that is traversed by 
a particular session. Further, the deterministic version 
includes additional benefits over the general purpose "live" 
version: the deterministic version is less intrusive because 
no traffic is sent to an active site; the "emulated" web site is 
fully controllable and can be manipulated at will whereas the 
"live" web site, in general, cannot; and, the emulated site 
allows computational shortcuts to be applied to make a site 
more efficient when it is placed on line. 

One benefit of the offline version is computational efficiency.
For example, the "monte carlo" nature of the general
method is sacrificed in exchange for a method that is 
deterministic yet which approximates the probability distri- 
butions employed in the general version. Rather than draw- 
ing the parameters for an emulated visit at random from a 
probability distribution, a parameter for the emulated visit is 
specified exactly by drawing it from an empirical sample. 
Another major approximation is obtained by utilizing a very 
simple stopping rule for determining when to end the 
simulation. Finally, the web site itself is not active during the 
simulation; instead, traversal of its topology is emulated by 
traversing records in a database. 

Below are the method steps for one embodiment of the 
deterministic version of the present invention, given the 
session logs for a set of visits. 

Step 1. Initialize

   1A. Rank order the sessions in the given session logs.

   1B. Set m to some finite constant integer.

Step 2. For each session ("actual visit") in the given
session logs:

   2A. Create an emulated visit:
      initialize the entry page to that of the actual visit,
      initialize the maximum clickstream lifespan to the
      lifespan of the actual visit.
   2B. Pass the emulated visit through the web site;
      all other actions emulated during the visit are
      determined probabilistically as in the General Method.

Step 3. Repeat Step 2 m times.

If the actual visits number 111,000 then setting m=5 will
result in 555,000 emulated visits. Because random choice
can be exhibited during an emulated session (only the entry
point and clickstream lifespan are determined explicitly
from an actual session), multiple emulated sessions
determined by an actual session can exhibit much different
behavior. For example, the specification of the Link Choice
Distribution can be accomplished in several ways, just as in
the general method.
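
The deterministic steps can be sketched in Python as follows. The sketch assumes that each actual session is a non-empty list of page names, that the lifespan of a session is its number of page views, and that link choices are uniform; the rank ordering of Step 1A is omitted. The site is emulated by a dictionary lookup rather than by a live server, and all names are hypothetical.

```python
import random

def deterministic_emulation(actual_sessions, site, m, seed=0):
    """For each actual session, emit m emulated visits that share its
    entry page and clickstream length; link choices remain random."""
    rng = random.Random(seed)
    emulated = []
    for _ in range(m):                              # Step 3: repeat m times
        for session in actual_sessions:             # Step 2: one pass over the logs
            page, lifespan = session[0], len(session)
            path = [page]
            # Stop at the actual lifespan or when the page has no links.
            while len(path) < lifespan and site.get(page):
                page = rng.choice(site[page])       # uniform link choice (assumed)
                path.append(page)
            emulated.append(path)
    return emulated

site = {"home": ["news", "shop"], "news": ["home"], "shop": ["home", "cart"], "cart": []}
actual = [["home", "news", "home"], ["shop", "cart"], ["news"]]
emulated = deterministic_emulation(actual, site, m=5)
```

With three actual sessions and m=5 the sketch produces 15 emulated visits, the same multiplication that turns 111,000 actual visits into 555,000 emulated visits.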

In one embodiment, the Link Choice Distribution may be 
based upon an equal likelihood determination, where, given 
a set of link options from which to choose, any particular 
option with equal likelihood is selected. This is equivalent to 
applying a uniform distribution to the set of choices, then 
clicking on a link according to this (uniform) distribution. 

In another embodiment, the Link Choice Distribution may
be based upon a modified equal likelihood determination
where a uniform distribution used to accomplish an equal
likelihood link preference is replaced with another discrete
distribution. For example, some subset of links may be
assigned a 0 probability, and a uniform distribution applied
to the remaining links. Or, it could be replaced by an
empirical distribution determined by actual users.
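
The modified equal likelihood case, in which some links receive zero probability and the rest share probability uniformly, can be illustrated with a few lines of Python. The function name `modified_uniform` and the choice to raise an error when every link is excluded are assumptions of this sketch.

```python
def modified_uniform(links, excluded=()):
    """Uniform distribution over `links`, with zero probability on `excluded`."""
    kept = [link for link in links if link not in excluded]
    if not kept:
        raise ValueError("every link was excluded")
    p = 1.0 / len(kept)
    return {link: (p if link in kept else 0.0) for link in links}

links = ["/news", "/shop", "/help", "/support"]
dist = modified_uniform(links, excluded={"/help", "/support"})
plain = modified_uniform(links)
```

Calling the function with no exclusions reduces to the plain equal likelihood determination described in the preceding paragraph.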

Other embodiments of the methods described above and
their functionality are attainable by taking hybrids of the
Deterministic and General versions. For example, the
Deterministic version can be enhanced to allow any degree of
realism just as in the General version, including the entry
page and clickstream lifespan. Additional embodiments can
be readily developed by those schooled in the art based upon
the above discussion and in light of the specification read as
a whole.

IV. Signal-Bearing Media 

In the context of FIGS. 1-2, such a method may be 
implemented, for example, by operating the internet system
100, as embodied by a digital data processing apparatus fii^t 
system 101, to execute a sequence of machine-readable 
instructions. These instructions may reside in various types 
of signal-bearing media. In this respect, one aspect of the 
present invention concerns a programmed product, comprising
signal-bearing media tangibly embodying a program of
machine-readable instructions executable by a digital data 
processor to perform a method to generate visitor traffic over 
a web site. 

This signal-bearing media may comprise, for example, 
RAM (not shown) contained within the web server 102. 
Alternatively, the instructions may be contained in another
signal-bearing media, such as a magnetic data storage diskette
900 (FIG. 9), directly or indirectly accessible by the
web server 102 or the ISP 110. Whether contained in the web
server 102 or elsewhere, the instructions may be stored on 
a variety of machine-readable data storage media, such as 
DASD storage (e.g., a conventional "hard drive" or a RAID 
array), magnetic tape, electronic read-only memory (e.g., 
ROM, EPROM, or EEPROM), an optical storage device 
(e.g., CD-ROM, WORM, DVD, digital optical tape), paper 
"punch" cards, or other suitable signal-bearing media 
including transmission media such as digital and analog
communication links and wireless. In an illustrative embodiment
of the invention, the machine-readable instructions
may comprise software object code, compiled from a language
such as C, C++, etc.

V. Other Embodiments 

While the foregoing disclosure shows a number of illus- 
trative embodiments of the invention, it will be apparent to 
those skilled in the art that various changes and modifica- 
tions can be made herein without departing from the scope 
of the invention as defined by the appended claims. 
Furthermore, although elements of the invention may be 
described or claimed in the singular, the plural is contem- 
plated unless limitation to the singular is explicitly stated. 

What is claimed is: 

1. A method for emulating behavior of web site visitors 
for producing web site trend analysis data, the method 
comprising: 

initializing an emulated distribution, the emulated distri- 
bution having data reflecting decisions made by visitors 
during a traversal of a web site and selecting a subset 
of a distribution to be emulated; 

creating an emulated distribution including an entry page 
distribution, the emulated distribution emulating distri- 
bution and transition probabilities for selected actions 
of an emulated visitor; 

specifying a maximum clickstream length;

storing the emulated distributions;

randomly selecting a number of visitors from the emulated
distribution;
traversing a web site using the randomly selected emulated 
visitors; and 

ending the emulation session. 

2. A method for emulating behavior of web site visitors 
for producing web site trend analysis data, the method 
comprising: 

initializing an emulated distribution, the emulated distri- 
bution having data reflecting decisions made by visitors 
during a traversal of a web site; 

said emulated traversal of the web site by a visitor 
comprising: 

selecting at random an entry page from an entry page 
distribution; 

specifying a maximum clickstream length by randomly
selecting a clickstream length from a clickstream 
lifespan distribution if the clickstream lifespan dis- 
tribution is enabled; 
entering the web site at the selected entry page; and 
traversing the web site; and 
randomly selecting a number of visitors from the emu- 
lated distribution; traversing a web site using the ran- 
domly selected emulated visitors; and 
ending the emulation session. 

3. The method recited in claim 2, traversing the web site 
comprising: 

generating a list of candidate links, a candidate link being 
a link choice available to a visitor on a page of the web 
site; 

selecting a candidate link from the list; and 
traversing the candidate link. 

4. The method recited in claim 3, the method further 
comprising selecting only candidate links that are allowable 
links. 

5. The method recited in claim 4, traversing a candidate 
link comprising: 

enabling link type preference distribution;
sorting candidate links by type; 

weighing each candidate link using link preference dis- 
tribution by link type, and ignoring any candidate link 
with a specified weight; 

weighing candidate links by a uniform distribution where 
each candidate is equally as likely; 

selecting allowable candidate links from the weighed 
candidate links; and 

selecting at random an allowable candidate link from the 
allowable weighed candidate links. 

6. A method for emulating behavior of web site visitors 
for producing trend analysis data, the method comprising: 

initializing an emulated distribution, the emulated distri- 
bution having data reflecting decisions made by visitors 
during a traversal of a web site; 

randomly selecting a number of visitors from the emu- 
lated distribution; 

traversing a web site using the randomly selected emu- 
lated visitors; the emulated traversal of the web site by 
a visitor comprising: 

selecting at random an entry page from an entry page 
distribution; 

specifying a maximum clickstream length by randomly
selecting a clickstream length from a clickstream lifespan
distribution if the clickstream lifespan distribution is
enabled;
entering the web site at the selected entry page;
traversing the web site comprising generating a list of
candidate links, a candidate link being a link choice
available to a visitor on a page of the web site;
selecting a candidate link from the list;
traversing the candidate link selecting only candidate
links that are allowable links by enabling page
preference distribution;

retrieving a link preference distribution for a current
page;

weighing each candidate link using link preference
distribution for the current page, and ignoring any
candidate link with a specified weight; and

selecting at random an allowable candidate link from
the allowable weighted candidate links; and
ending the emulation session.

7. The method recited in claim 4, traversing a candidate 
link comprising: 

enabling global link preference distribution; 

sorting candidate links by position on a page; 

weighing each candidate link using the global link pref- 
erence distribution; and 

selecting at random an allowable candidate link from the 
allowable weighted candidate links.

8. The method recited in claim 4, traversing a candidate
link comprising: 

if link type preference distribution is enabled, then:
sorting candidate links by type;
weighing each candidate link using link preference
distribution by link type, and ignoring any candidate
link with a specified weight;
weighing candidate links by a uniform distribution
where each candidate is equally as likely;
selecting allowable candidate links from the weighed
candidate links; and
selecting at random an allowable candidate link from
the allowable weighed candidate links;
if page preference distribution is enabled, then:
retrieving a link preference distribution for a current
page is selected, then:
weighing each candidate link using link preference
distribution for the current page, and ignoring any
candidate link with a specified weight; and
selecting at random an allowable candidate link from
the allowable weighted candidate links;
if global link preference distribution is enabled, then:
sorting candidate links by position on a page;
weighing each candidate link using the global link
preference distribution; and
selecting at random an allowable candidate link from
the allowable weighted candidate links.

9. The method recited in claim 5, ending the emulation 
session comprising: 

ending the emulation session if all allowable weighted 
link candidates have been traversed. 

10. The method recited in claim 6, ending the emulation 
session comprising: 

if the clickstream lifespan distribution is enabled, ending 
the emulation session; 

otherwise, if a local clickstream lifespan distribution is 
enabled, choosing randomly whether or not to end the 
emulation session based upon a most relevant local 
distribution. 

11. The method recited in claim 7, ending the emulation 
session comprising: 



if an emulation session length has reached the global 
maximum, ending the session. 

12. The method recited in claim 8, ending the emulation 
session comprising: 

if a link preference distribution by link type is enabled, 
ending the emulation session if all allowable weighted 
link candidates have been traversed; 

if the clickstream lifespan distribution is enabled, ending 
the emulation session; 

if a local clickstream lifespan distribution is enabled, 
choosing randomly whether or not to end the emulation 
session based upon a most relevant local distribution; 
and 

if an emulation session length has reached the global 
maximum, ending the session. 
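
The termination conditions listed in this claim can be sketched as a single check, shown below in Python. This is one possible reading only. The function name, the parameters, and the assumption that an enabled clickstream lifespan distribution ends the session once the drawn maximum length is reached are illustrative and do not come from the claims.

```python
import random


def should_end_session(clicks, untraversed, rng, max_clicks=None,
                       local_end_probability=None, global_max=100):
    """Decide whether an emulated session should stop.

    clicks: number of pages visited so far in this session.
    untraversed: allowable weighted candidate links not yet traversed.
    max_clicks: length drawn from the clickstream lifespan distribution,
        or None when that distribution is disabled.
    local_end_probability: chance of stopping on the current page taken
        from the most relevant local distribution, or None when disabled.
    global_max: hypothetical hard cap on session length.
    """
    if not untraversed:
        return True  # every allowable candidate has been traversed
    if max_clicks is not None and clicks >= max_clicks:
        return True  # drawn lifespan exhausted (assumed interpretation)
    if local_end_probability is not None and rng.random() < local_end_probability:
        return True  # random stop based on the local distribution
    return clicks >= global_max


if __name__ == "__main__":
    rng = random.Random(0)
    print(should_end_session(3, ["/next"], rng, max_clicks=5))
```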

13. The method of claim 1, the emulated traversal of a 
web site by a visitor comprising: 

selecting at random an entry page from an entry page 
distribution; 

specifying a maximum clickstream length by randomly 
selecting a clickstream length from a clickstream 
lifespan distribution if the clickstream lifespan distri- 
bution is enabled; 

entering the web site at the selected entry page; and 

traversing the web site. 
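
The four steps of this claim (entry page, maximum clickstream length, entry, traversal) can be strung together as in the Python sketch below. The site is modelled as a plain dictionary of page to outgoing links, links are followed uniformly at random, and the clickstream length is counted in pages. All of these are simplifying assumptions, as is every name used.

```python
import random


def emulate_visitor(site, entry_distribution, lifespan_distribution=None,
                    rng=None, global_max=100):
    """Return the clickstream (list of pages) of one emulated visitor.

    site: hypothetical mapping of page -> list of links on that page.
    entry_distribution: mapping of entry page -> probability weight.
    lifespan_distribution: optional mapping of length -> weight; when it
        is None the lifespan distribution is treated as disabled.
    """
    rng = rng or random.Random()
    # Select an entry page at random from the entry page distribution.
    pages = list(entry_distribution)
    page = rng.choices(pages, weights=[entry_distribution[p] for p in pages], k=1)[0]
    # Specify a maximum clickstream length if the distribution is enabled.
    max_clicks = global_max
    if lifespan_distribution is not None:
        lengths = list(lifespan_distribution)
        max_clicks = rng.choices(
            lengths, weights=[lifespan_distribution[n] for n in lengths], k=1)[0]
    # Enter the site at the selected page, then traverse it.
    clickstream = [page]
    while len(clickstream) < max_clicks and site.get(page):
        page = rng.choice(site[page])  # uniform choice, for brevity
        clickstream.append(page)
    return clickstream


if __name__ == "__main__":
    demo_site = {"/home": ["/about", "/shop"], "/about": ["/home"], "/shop": []}
    print(emulate_visitor(demo_site, {"/home": 0.7, "/shop": 0.3},
                          {2: 0.5, 4: 0.5}, random.Random(0)))
```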

14. The method recited in claim 13, traversing a web site 
comprising: 

generating a list of candidate links, where a candidate link 
is a link choice available on a page of the web site; 

selecting a candidate link from the list; and 

traversing the candidate link. 

15. The method recited in claim 14, the method further 
comprising selecting only candidate links that are allowable 
links. 

16. The method recited in claim 15, traversing a candidate 
link comprising: 

enabling link type preference distribution; 
sorting candidate links by type; 

weighing each candidate link using link preference dis- 
tribution by link type, and ignoring any candidate link 
with a specified weight; 

weighing candidate links by a uniform distribution where 
each candidate is equally as likely; 

selecting allowable candidate links from the weighed 
candidate links; and 

selecting at random an allowable candidate link from the 
allowable weighed candidate links. 
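
The claim language for link type preference leaves room for more than one reading. The Python sketch below takes one of them: candidates are grouped by type, a type is drawn according to the link type preference distribution, and a link of that type is then drawn uniformly. The `(url, link_type)` layout, the function name, and the treatment of zero-weight types as ignored are assumptions.

```python
import random
from collections import defaultdict


def choose_by_link_type(candidates, type_preferences, rng):
    """Pick a link by first choosing a link type, then a link of that type.

    candidates: list of (url, link_type) tuples on the current page.
    type_preferences: hypothetical mapping of link_type -> weight.
    rng: a random.Random instance.
    """
    # Sort (group) candidate links by type.
    by_type = defaultdict(list)
    for url, link_type in candidates:
        by_type[link_type].append(url)
    # Ignore types whose weight is zero (assumed "specified weight").
    allowable = {t: urls for t, urls in by_type.items()
                 if type_preferences.get(t, 0.0) > 0.0}
    if not allowable:
        return None
    types = sorted(allowable)
    chosen_type = rng.choices(
        types, weights=[type_preferences[t] for t in types], k=1)[0]
    # Within the chosen type every candidate is equally likely.
    return rng.choice(allowable[chosen_type])


if __name__ == "__main__":
    links = [("/banner", "image"), ("/news", "text"), ("/blog", "text")]
    print(choose_by_link_type(links, {"image": 0.2, "text": 0.8}, random.Random(0)))
```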

17. The method recited in claim 15, traversing a candidate 
link comprising: 

enabling page preference distribution; 

retrieving a link preference distribution for a current page; 

weighing each candidate link using link preference dis- 
tribution for the current page, and ignoring any candi- 
date link with a specified weight; and 

selecting at random an allowable candidate link from the 
allowable weighted candidate links. 

18. The method recited in claim 15, traversing a candidate 
link comprising: 

enabling global link preference distribution; 

sorting candidate links by position on a page; 

weighing each candidate link using the global link pref- 
erence distribution; and 

selecting at random an allowable candidate link from the 
allowable weighted candidate links. 

19. The method recited in claim 15, traversing a candidate 
link comprising: 

if link type preference distribution is enabled, then: 
  sorting candidate links by type; 
  weighing each candidate link using link preference 
    distribution by link type, and ignoring any candidate 
    link with a specified weight; 
  weighing candidate links by a uniform distribution 
    where each candidate is equally as likely; 
  selecting allowable candidate links from the weighed 
    candidate links; and 
  selecting at random an allowable candidate link from 
    the allowable weighed candidate links; 

if page preference distribution is enabled, then: 
  retrieving a link preference distribution for a current 
    page is selected, then: 
  weighing each candidate link using link preference 
    distribution for the current page, and ignoring any 
    candidate link with a specified weight; and 
  selecting at random an allowable candidate link from 
    the allowable weighted candidate links; 

if global link preference distribution is enabled, then: 
  sorting candidate links by position on a page; 
  weighing each candidate link using the global link 
    preference distribution; and 
  selecting at random an allowable candidate link from 
    the allowable weighted candidate links. 

20. The method recited in claim 16, ending the emulation 
session comprising: 

ending the emulation session if all allowable weighted 
link candidates have been traversed. 

21. The method recited in claim 17, ending the emulation 
session comprising: 

if the clickstream lifespan distribution is enabled, ending 
the emulation session; 

otherwise, if a local clickstream lifespan distribution is 
enabled, choosing randomly whether or not to end the 
emulation session based upon a most relevant local 
distribution. 

22. The method recited in claim 18, ending the emulation 
session comprising: 

if an emulation session length has reached the global 
maximum, ending the session. 

23. The method recited in claim 19, ending the emulation 
session comprising: 

if a link preference distribution by link type is enabled, 
ending the emulation session if all allowable weighted 
link candidates have been traversed; 

if the clickstream lifespan distribution is enabled, ending 
the emulation session; 

if a local clickstream lifespan distribution is enabled, 
choosing randomly whether or not to end the emulation 
session based upon a most relevant local distribution; 
and 

if an emulation session length has reached the global 
maximum, ending the session. 

24. A signal-bearing medium tangibly embodying a pro- 
gram of machine-readable instructions executable by a digi- 
tal processing apparatus to perform a method for emulating 
behavior of a web site visitor for producing web site trend 
analysis data, the method comprising: 

initializing an emulated distribution, the emulated distri- 
bution having data reflecting decisions made by visitors 
during an emulated traversal of a web site and selecting 
a subset of a distribution to be emulated; 

creating an emulated distribution including an entry page 
distribution, the emulated distribution emulating distri- 
bution and transition probabilities for selected actions 
of an emulated visitor; 

specifying a maximum clickstream length; and 

storing the emulated distribution; 

randomly selecting a number of emulated visitors from 
the emulated distribution; 

traversing the web site using the randomly selected emu- 
lated visitors; and 

ending the emulation session. 
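
To make the "creating" and "storing" steps of this claim more concrete, the Python sketch below derives an entry page distribution and transition probabilities from a list of recorded clickstreams. Building the distribution from historical clickstreams, the function name, and the dictionary layout are assumptions chosen for illustration. The claim itself does not prescribe a data source or a storage format.

```python
import json
from collections import Counter, defaultdict


def build_emulated_distribution(clickstreams):
    """Estimate entry-page and transition probabilities from clickstreams.

    clickstreams: list of page sequences, one sequence per recorded visit.
    Returns a dict with an "entry" distribution and per-page "transitions".
    """
    # Count how often each page is the first page of a visit.
    entry_counts = Counter(stream[0] for stream in clickstreams if stream)
    # Count page-to-page moves.
    transition_counts = defaultdict(Counter)
    for stream in clickstreams:
        for source, target in zip(stream, stream[1:]):
            transition_counts[source][target] += 1
    # Normalise counts into probabilities.
    total_entries = sum(entry_counts.values())
    entry = {page: n / total_entries for page, n in entry_counts.items()}
    transitions = {
        source: {target: n / sum(targets.values()) for target, n in targets.items()}
        for source, targets in transition_counts.items()
    }
    return {"entry": entry, "transitions": transitions}


if __name__ == "__main__":
    history = [["/home", "/shop"], ["/home", "/about"], ["/shop", "/home"]]
    distribution = build_emulated_distribution(history)
    # "Storing" is shown here as JSON text; any persistent store would do.
    print(json.dumps(distribution, indent=2, sort_keys=True))
```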

25. A signal-bearing medium tangibly embodying a pro- 
gram of machine-readable instructions executable by a digi- 
tal processing apparatus to perform a method for emulating 
behavior of a web site visitor for producing web site trend 
analysis data, the method comprising: 

initializing an emulated distribution, the emulated distri- 
bution having data reflecting decisions made by visitors 
during an emulated traversal of a web site; 

said emulated traversal of the web site by a visitor 
comprising: 

selecting at random an entry page from an entry page 
distribution; 

specifying a maximum clickstream length by randomly 
selecting a clickstream length from a clickstream 
lifespan distribution if the clickstream lifespan dis- 
tribution is enabled; 
entering the web site at the selected entry page; and 
traversing the web site; 
randomly selecting a number of emulated visitors from 

the emulated distribution; 
traversing the web site using the randomly selected emu- 
lated visitors; and 
ending the emulation session. 

26. The medium recited in claim 25, traversing the web 
site comprising: 

generating a list of candidate links, a candidate link being 
a link choice available to a visitor on a page of the web 
site; 

selecting a candidate link from the list; and 
traversing the candidate link. 

27. The medium recited in claim 26, the method further 
comprising selecting only candidate links that are allowable 
links. 

28. The medium recited in claim 27, traversing a candi- 
date link comprising: 

enabling link type preference distribution; 
sorting candidate links by type; 

weighing each candidate link using link preference dis- 
tribution by link type, and ignoring any candidate link 
with a specified weight; 

weighing candidate links by a uniform distribution where 
each candidate is equally as likely; 

selecting allowable candidate links from the weighed 
candidate links; and 

selecting at random an allowable candidate link from the 
allowable weighed candidate links. 

29. A signal-bearing medium tangibly embodying a pro- 
gram of machine-readable instructions executable by a digi- 
tal processing apparatus to perform a method for emulating 
behavior of a web site visitor for producing web site trend 
analysis data, the method comprising: 

initializing an emulated distribution, the emulated distri- 
bution having data reflecting decisions made by visitors 
during an emulated traversal of a web site; 

randomly selecting a number of emulated visitors from 
the emulated distribution; 

traversing the web site using the randomly selected emu- 
lated visitors, the emulated traversal of the web site by 
a visitor comprising: 

selecting at random an entry page from an entry page 
distribution; 

specifying a maximum clickstream length by randomly 
selecting a clickstream length from a clickstream 
lifespan distribution if the clickstream lifespan dis- 
tribution is enabled; 

entering the web site at the selected entry page; 

traversing the web site comprising: 

generating a list of candidate links, a candidate link 
being a link choice available to a visitor on a page 
of the web site; 

selecting a candidate link from the list; 

traversing the candidate link comprising enabling 
page preference distribution; 

retrieving a link preference distribution for a current 
page; 

weighing each candidate link using link preference 
distribution for the current page, and ignoring any 
candidate link with a specified weight; and 

selecting at random an allowable candidate link from the 
allowable weighted candidate links; 

selecting only candidate links that are allowable links; and 

ending the emulation session. 

30. The medium recited in claim 27, traversing a candi- 
date link comprising: 

enabling global link preference distribution; 

sorting candidate links by position on a page; 

weighing each candidate link using the global link pref- 
erence distribution; and 

selecting at random an allowable candidate link from the 
allowable weighted candidate links. 

31. The medium recited in claim 27, traversing a candi- 
date link comprising: 

if link type preference distribution is enabled, then: 
  sorting candidate links by type; 
  weighing each candidate link using link preference 
    distribution by link type, and ignoring any candidate 
    link with a specified weight; 
  weighing candidate links by a uniform distribution 
    where each candidate is equally as likely; 
  selecting allowable candidate links from the weighed 
    candidate links; and 
  selecting at random an allowable candidate link from 
    the allowable weighed candidate links; 

if page preference distribution is enabled, then: 
  retrieving a link preference distribution for a current 
    page is selected, then: 
  weighing each candidate link using link preference 
    distribution for the current page, and ignoring any 
    candidate link with a specified weight; and 
  selecting at random an allowable candidate link from 
    the allowable weighted candidate links; 

if global link preference distribution is enabled, then: 
  sorting candidate links by position on a page; 
  weighing each candidate link using the global link 
    preference distribution; and 
  selecting at random an allowable candidate link from 
    the allowable weighted candidate links. 

32. The medium recited in claim 28, ending the emulation 
session comprising: 

ending the emulation session if all allowable weighted 
link candidates have been traversed. 

33. The medium recited in claim 29, ending the emulation 
session comprising: 

if the clickstream lifespan distribution is enabled, ending 
the emulation session; 

otherwise, if a local clickstream lifespan distribution is 
enabled, choosing randomly whether or not to end the 
emulation session based upon a most relevant local 
distribution. 

34. The medium recited in claim 30, ending the emulation 
session comprising: 

if an emulation session length has reached the global 
maximum, ending the session. 

35. The medium recited in claim 31, ending the emulation 
session comprising: 

if a link preference distribution by link type is enabled, 
ending the emulation session if all allowable weighted 
link candidates have been traversed; 

if the clickstream lifespan distribution is enabled, ending 
the emulation session; 

if a local clickstream lifespan distribution is enabled, 
choosing randomly whether or not to end the emulation 
session based upon a most relevant local distribution; 
and 

if an emulation session length has reached the global 
maximum, ending the session. 

36. The medium of claim 24, the emulated traversal of a 
web site by a visitor comprising: 

selecting at random an entry page from an entry page 
distribution; 

specifying a maximum clickstream length by randomly 
selecting a clickstream length from a clickstream 
lifespan distribution if the clickstream lifespan distri- 
bution is enabled; 

entering the web site at the selected entry page; and 

traversing the web site. 

37. The medium recited in claim 36, traversing a web site 
comprising: 

generating a list of candidate links, where a candidate link 
is a link choice available on a page of the web site; 

selecting a candidate link from the list; and 

traversing the candidate link. 

38. The medium recited in claim 37, the method further 
comprising selecting only candidate links that are allowable 
links. 

39. The medium recited in claim 38, traversing a candi- 
date link comprising: 

enabling link type preference distribution; 
sorting candidate links by type; 

weighing each candidate link using link preference dis- 
tribution by link type, and ignoring any candidate link 
with a specified weight; 

weighing candidate links by a uniform distribution where 
each candidate is equally as likely; 

selecting allowable candidate links from the weighed 
candidate links; and 

selecting at random an allowable candidate link from the 
allowable weighed candidate links. 

40. The medium recited in claim 39, traversing a candi- 
date link comprising: 

enabling page preference distribution; 

retrieving a link preference distribution for a current page; 

weighing each candidate link using link preference dis- 
tribution for the current page, and ignoring any candi- 
date link with a specified weight; and 

selecting at random an allowable candidate link from the 
allowable weighted candidate links. 

41. The medium recited in claim 39, traversing a candi- 
date link comprising: 

enabling global link preference distribution; 

sorting candidate links by position on a page; 

weighing each candidate link using the global link pref- 
erence distribution; and 

selecting at random an allowable candidate link from the 
allowable weighted candidate links. 

42. The medium recited in claim 38, traversing a candi- 
date link comprising: 

if link type preference distribution is enabled, then: 
  sorting candidate links by type; 
  weighing each candidate link using link preference 
    distribution by link type, and ignoring any candidate 
    link with a specified weight; 
  weighing candidate links by a uniform distribution 
    where each candidate is equally as likely; 
  selecting allowable candidate links from the weighed 
    candidate links; and 
  selecting at random an allowable candidate link from 
    the allowable weighed candidate links; 

if page preference distribution is enabled, then: 
  retrieving a link preference distribution for a current 
    page is selected, then: 
  weighing each candidate link using link preference 
    distribution for the current page, and ignoring any 
    candidate link with a specified weight; and 
  selecting at random an allowable candidate link from 
    the allowable weighted candidate links; 

if global link preference distribution is enabled, then: 
  sorting candidate links by position on a page; 
  weighing each candidate link using the global link 
    preference distribution; and 
  selecting at random an allowable candidate link from 
    the allowable weighted candidate links. 

43. The medium recited in claim 39, ending the emulation 
session comprising: 

ending the emulation session if all allowable weighted 
link candidates have been traversed. 

44. The medium recited in claim 40, ending the emulation 
session comprising: 

if the clickstream lifespan distribution is enabled, ending 
the emulation session; 

otherwise, if a local clickstream lifespan distribution is 
enabled, choosing randomly whether or not to end the 
emulation session based upon a most relevant local 
distribution. 

45. The medium recited in claim 43, ending the emulation 
session comprising: 

if an emulation session length has reached the global 
maximum, ending the session. 

46. The medium recited in claim 44, ending the emulation 
session comprising: 

if a link preference distribution by link type is enabled, 
ending the emulation session if all allowable weighted 
link candidates have been traversed; 

if the clickstream lifespan distribution is enabled, ending 
the emulation session; 

if a local clickstream lifespan distribution is enabled, 
choosing randomly whether or not to end the emulation 
session based upon a most relevant local distribution; 
and 

if an emulation session length has reached the global 
maximum, ending the session. 

47. A computer-driven system to emulate behavior of 
web site visitors for producing web site trend analysis data, 
the system comprising: 

a storage; 
a processor; 

circuitry communicatively coupling the storage to the 
processor, the processor being capable of assisting in 
the emulation of web site visitor behavior by: 

initializing an emulated distribution, the emulated distri- 
bution having data reflecting decisions made by emu- 
lated visitors during an emulated traversal of a web site 
and selecting a subset of a distribution to be emulated; 

creating an emulated distribution including an entry page 
distribution, the emulated distribution emulating distri- 
bution and transition probabilities for selected actions 
of an emulated visitor; 

specifying a maximum clickstream length; and 

storing the emulated distributions; 

randomly selecting a number of emulated visitors from 
the emulated distribution; 

traversing the web site using the randomly selected emu- 
lated visitors; and 

ending the emulation session. 

48. A computer-driven system to emulate behavior of 
web site visitors for producing web site trend analysis data, 
the system comprising: 

a storage; 
a processor; 

circuitry communicatively coupling the storage to the 
processor, the processor being capable of assisting in 
the emulation of web site visitor behavior by: 

initializing an emulated distribution, the emulated distri- 
bution having data reflecting decisions made by emu- 
lated visitors during an emulated traversal of a web site; 

said emulated traversal of the web site by a visitor 
comprising: 

selecting at random an entry page from an entry page 
distribution; 

specifying a maximum clickstream length by randomly 
selecting a clickstream length from a clickstream 
lifespan distribution if the clickstream lifespan dis- 
tribution is enabled; 

entering the web site at the selected entry page; and 

traversing the web site; 

randomly selecting a number of emulated visitors from 
the emulated distribution; 
traversing the web site using the randomly selected emu- 
lated visitors; and ending the emulation session. 

49. The system recited in claim 48, traversing the web site 
comprising: 

generating a list of candidate links, a candidate link being 
a link choice available to a visitor on a page of the web 
site; 

selecting a candidate link from the list; and 
traversing the candidate link. 

50. The system recited in claim 49, the method further 
comprising selecting only candidate links that are allowable 
links. 

51. The system recited in claim 50, traversing a candidate 
link comprising: 

enabling link type preference distribution; 

sorting candidate links by type; 

weighing each candidate link using link preference dis- 
tribution by link type, and ignoring any candidate link 
with a specified weight; 

weighing candidate links by a uniform distribution where 
each candidate is equally as likely; 

selecting allowable candidate links from the weighed 
candidate links; and 

selecting at random an allowable candidate link from the 
allowable weighed candidate links. 

52. A computer-driven system to emulate behavior of 
web site visitors for producing web site trend analysis data, 
the system comprising: 

a storage; 
a processor; 

circuitry communicatively coupling the storage to the 
processor, the processor being capable of assisting in 
the emulation of web site visitor behavior by: 

initializing an emulated distribution, the emulated distri- 
bution having data reflecting decisions made by emu- 
lated visitors during an emulated traversal of a web site; 

randomly selecting a number of emulated visitors from 
the emulated distribution by: 

selecting at random an entry page from an entry page 
distribution; 

specifying a maximum clickstream length by randomly 
selecting a clickstream length from a clickstream 
lifespan distribution if the clickstream lifespan dis- 
tribution is enabled; 

entering the web site at the selected entry page; and 

traversing the web site comprising generating a list of 
candidate links, a candidate link being a link choice 
available to a visitor on a page of the web site; 

selecting a candidate link from the list and selecting 
only candidate links that are allowable links; and 

traversing the candidate link comprising enabling page 
preference distribution; 

retrieving a link preference distribution for a current 
page; 

weighing each candidate link using link preference 
distribution for the current page, and ignoring any 
candidate link with a specified weight; and 
selecting at random an allowable candidate link from 
the allowable weighted candidate links; 
traversing the web site using the randomly selected emu- 
lated visitors; and 
ending the emulation session. 

53. The system recited in claim 50, traversing a candidate 
link comprising: 

enabling global link preference distribution; 

sorting candidate links by position on a page; 

weighing each candidate link using the global link pref- 
erence distribution; and 

selecting at random an allowable candidate link from the 
allowable weighted candidate links. 

54. The system recited in claim 50, traversing a candidate 
link comprising: 

if link type preference distribution is enabled, then: 
  sorting candidate links by type; 
  weighing each candidate link using link preference 
    distribution by link type, and ignoring any candidate 
    link with a specified weight; 
  weighing candidate links by a uniform distribution 
    where each candidate is equally as likely; 
  selecting allowable candidate links from the weighed 
    candidate links; and 
  selecting at random an allowable candidate link from 
    the allowable weighed candidate links; 

if page preference distribution is enabled, then: 
  retrieving a link preference distribution for a current 
    page is selected, then: 
  weighing each candidate link using link preference 
    distribution for the current page, and ignoring any 
    candidate link with a specified weight; and 
  selecting at random an allowable candidate link from 
    the allowable weighted candidate links; 

if global link preference distribution is enabled, then: 
  sorting candidate links by position on a page; 
  weighing each candidate link using the global link 
    preference distribution; and 
  selecting at random an allowable candidate link from 
    the allowable weighted candidate links. 

55. The system recited in claim 51, ending the emulation 
session comprising: 

ending the emulation session if all allowable weighted 
link candidates have been traversed. 

56. The system recited in claim 52, ending the emulation 
session comprising: 

if the clickstream lifespan distribution is enabled, ending 
the emulation session; 

otherwise, if a local clickstream lifespan distribution is 
enabled, choosing randomly whether or not to end the 
emulation session based upon a most relevant local 
distribution. 

57. The system recited in claim 53, ending the emulation 
session comprising: 

if an emulation session length has reached the global 
maximum, ending the session. 

58. The system recited in claim 54, ending the emulation 
session comprising: 

if a link preference distribution by link type is enabled, 
ending the emulation session if all allowable weighted 
link candidates have been traversed; 

if the clickstream lifespan distribution is enabled, ending 
the emulation session; 

if a local clickstream lifespan distribution is enabled, 
choosing randomly whether or not to end the emulation 
session based upon a most relevant local distribution; 
and 

if an emulation session length has reached the global 
maximum, ending the session. 

59. The system of claim 47, the emulated traversal of a 
web site by a visitor comprising: 

selecting at random an entry page from an entry page 
distribution; 

specifying a maximum clickstream length by randomly 
selecting a clickstream length from a clickstream 
lifespan distribution if the clickstream lifespan distri- 
bution is enabled; 

entering the web site at the selected entry page; and 

traversing the web site. 

60. The system recited in claim 59, traversing a web site 
comprising: 

generating a list of candidate links, where a candidate link 

is a link choice available on a page of the web site; 
selecting a candidate link from the list; and 
traversing the candidate link. 

61. The system recited in claim 60, the method further 
comprising selecting only candidate links that are allowable 
links. 

62. The system recited in claim 61, traversing a candidate 
link comprising: 

enabling link type preference distribution; 
sorting candidate links by type; 

weighing each candidate link using link preference dis- 
tribution by link type, and ignoring any candidate link 
with a specified weight; 

weighing candidate links by a uniform distribution where 
each candidate is equally as likely; 

selecting allowable candidate links from the weighed 
candidate links; and 

selecting at random an allowable candidate link from the 
allowable weighed candidate links. 

63. The system recited in claim 61, traversing a candidate 
link comprising: 

enabling page preference distribution; 

retrieving a link preference distribution for a current page; 

weighing each candidate link using link preference dis- 
tribution for the current page, and ignoring any candi- 
date link with a specified weight; and 

selecting at random an allowable candidate link from the 
allowable weighted candidate links. 

64. The system recited in claim 61, traversing a candidate 
link comprising: 

enabling global link preference distribution; 

sorting candidate links by position on a page; 

weighing each candidate link using the global link pref- 
erence distribution; and 

selecting at random an allowable candidate link from the 
allowable weighted candidate links. 

65. The system recited in claim 61, traversing a candidate 
link comprising: 

if link type preference distribution is enabled, then: 
  sorting candidate links by type; 
  weighing each candidate link using link preference 
    distribution by link type, and ignoring any candidate 
    link with a specified weight; 
  weighing candidate links by a uniform distribution 
    where each candidate is equally as likely; 
  selecting allowable candidate links from the weighed 
    candidate links; and 
  selecting at random an allowable candidate link from 
    the allowable weighed candidate links; 

if page preference distribution is enabled, then: 
  retrieving a link preference distribution for a current 
    page is selected, then: 
  weighing each candidate link using link preference 
    distribution for the current page, and ignoring any 
    candidate link with a specified weight; and 
  selecting at random an allowable candidate link from 
    the allowable weighted candidate links; 

if global link preference distribution is enabled, then: 
  sorting candidate links by position on a page; 
  weighing each candidate link using the global link 
    preference distribution; and 
  selecting at random an allowable candidate link from 
    the allowable weighted candidate links. 

66. The system recited in claim 62, ending the emulation 
session comprising: 

ending the emulation session if all allowable weighted 
link candidates have been traversed. 

67. The system recited in claim 63, ending the emulation 
session comprising: 

if the clickstream lifespan distribution is enabled, ending 
the emulation session; 

otherwise, if a local clickstream lifespan distribution is 
enabled, choosing randomly whether or not to end the 
emulation session based upon a most relevant local 
distribution. 

68. The system recited in claim 64, ending the emulation 
session comprising: 

if an emulation session length has reached the global 
maximum, ending the session. 

69. The system recited in claim 65, ending the emulation 
session comprising: 

if a link preference distribution by link type is enabled, 
ending the emulation session if all allowable weighted 
link candidates have been traversed; 

if the clickstream lifespan distribution is enabled, ending 
the emulation session; 

if a local clickstream lifespan distribution is enabled, 
choosing randomly whether or not to end the emulation 
session based upon a most relevant local distribution; 
and 

if an emulation session length has reached the global 
maximum, ending the session. 
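
By way of illustration only, the four session-ending conditions recited above might be combined into a single check as sketched below. All names (should_end_session, lifespan, local_end_prob, global_max) are hypothetical. The sketch assumes that an enabled clickstream lifespan distribution ends the session once a previously drawn lifespan is reached, and that the most relevant local distribution has already been reduced to a single ending probability; neither detail is stated in the claim.

```python
import random


def should_end_session(clicks, untraversed, *, link_type_enabled=False,
                       lifespan=None, local_end_prob=None,
                       global_max=None, rng=random):
    """Return True if the emulation session should end.

    clicks: number of pages visited so far in this session.
    untraversed: count of allowable weighted link candidates not yet traversed.
    lifespan: session length drawn earlier from the clickstream lifespan
        distribution, or None if that distribution is disabled (assumption).
    local_end_prob: ending probability from the most relevant local
        distribution, or None if disabled (assumption).
    global_max: global maximum session length, or None if unset.
    """
    # Link preference by link type: stop when every candidate was traversed.
    if link_type_enabled and untraversed == 0:
        return True
    # Clickstream lifespan: stop once the drawn lifespan is reached.
    if lifespan is not None and clicks >= lifespan:
        return True
    # Local lifespan: choose randomly whether or not to end.
    if local_end_prob is not None and rng.random() < local_end_prob:
        return True
    # Global maximum session length.
    if global_max is not None and clicks >= global_max:
        return True
    return False
```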

70. An apparatus for emulating behavior of web site 
visitors for producing web site trend analysis data, the 
apparatus comprising: 

storage means for storing data; 

a processing means for processing data, the processing 
means assisting in the emulation of web site visitor 
behavior by: 

initializing an emulated distribution, the emulated
distribution having data reflecting decisions made by
emulated visitors during an emulated traversal of a web site
and selecting a subset of a distribution to be emulated;

creating an emulated distribution including an entry page
distribution, the emulated distribution emulating
distribution and transition probabilities for selected actions
of an emulated visitor;

specifying a maximum clickstream length; and 

storing the emulated distributions; 

randomly selecting a number of emulated visitors from 
the emulated distribution; 

traversing the web site using the randomly selected
emulated visitors; and ending the emulation session.

71. An apparatus for emulating behavior of web site 
visitors for producing web site trend analysis data, the 
apparatus comprising: 

storage means for storing data; 

a processing means for processing data, the processing 
means assisting in the emulation of web site visitor 
behavior by: 

initializing an emulated distribution, the emulated
distribution having data reflecting decisions made by
emulated visitors during an emulated traversal of a web site,
said emulated traversal of the web site by a visitor
comprising:

selecting at random an entry page from an entry page 
distribution; 

specifying a maximum clickstream length by randomly
selecting a clickstream length from a clickstream
lifespan distribution if the clickstream lifespan
distribution is enabled;
entering the web site at the selected entry page; and 
traveling the web site; 
randomly selecting a number of emulated visitors from 

the emulated distribution; 
traversing the web site using the randomly selected
emulated visitors; and
ending the emulation session. 
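
By way of illustration only, the emulated traversal recited above (random entry page, optional random clickstream length, then traversal) might be sketched as follows. The site is modeled as a hypothetical dictionary mapping each page to its outgoing links, and both distributions are modeled as dictionaries of value to weight. The names emulate_visitor and run_emulation and the default global_max value are assumptions, and links are chosen uniformly here for brevity rather than by any preference distribution.

```python
import random


def emulate_visitor(site, entry_dist, lifespan_dist=None, global_max=20,
                    rng=random):
    """Emulate one visitor and return the list of pages visited.

    site: dict page -> list of linked pages (hypothetical site model).
    entry_dist: dict entry page -> weight (entry page distribution).
    lifespan_dist: dict clickstream length -> weight, or None if disabled.
    global_max: assumed cap on clickstream length, counted in pages.
    """
    # Select an entry page at random from the entry page distribution.
    pages, page_weights = zip(*entry_dist.items())
    page = rng.choices(pages, weights=page_weights, k=1)[0]

    # Specify the maximum clickstream length, drawn at random if enabled.
    max_len = global_max
    if lifespan_dist:
        lengths, length_weights = zip(*lifespan_dist.items())
        drawn = rng.choices(lengths, weights=length_weights, k=1)[0]
        max_len = min(global_max, drawn)

    # Enter the site at the entry page and traverse until the limit
    # is reached or the current page has no outgoing links.
    path = [page]
    while len(path) < max_len and site.get(page):
        page = rng.choice(site[page])  # uniform choice, for brevity
        path.append(page)
    return path


def run_emulation(site, entry_dist, n_visitors, **kwargs):
    """Traverse the site with n_visitors emulated visitors."""
    return [emulate_visitor(site, entry_dist, **kwargs)
            for _ in range(n_visitors)]
```

The list of paths returned by run_emulation is the kind of raw material from which a reference distribution could be tallied, for example by counting page-to-page transitions.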

***** 